What's in a Name? 
Among the several findings deriving from the application of complex network formalism to the 
investigation of natural phenomena, the fact that linguistic constructions follow power laws presents 
special interest for its potential implications for psychology and brain science. By corresponding to 
one of the most essentially human manifestations, such language-related properties suggest that sim- 
ilar dynamics may also be inherent to the brain areas related to language and associative memory, 
and perhaps even consciousness. The present work reports a preliminary experimental investiga- 
tion aimed at characterizing and modeling the flow of sequentially induced associations between 
words from the English language in terms of complex networks. The data is produced through a 
psychophysical experiment where a word is presented to the subject, who is requested to associate 
another word. Complex network and graph theory formalism and measurements are applied in 
order to characterize the experimental data. Several interesting results are identified, including the 
characterization of attraction basins, association asymmetries, context biasing, as well as a possible 
power-law underlying word associations, which could be explained by the appearance of strange 
loops along the hierarchical structure underlying word categories. 

PACS numbers: 



'. . . that which we call a rose 

By any other name would smell as sweet' 

(Romeo and Juliet, ACT II) 



I. INTRODUCTION 

Despite its long tradition in mathematics and computer science,
graph theory has reached great popularity only recently through
innovative research in the novel area which became known as
complex networks. By integrating theoretical principles,
especially from statistical mechanics, with experimental and
simulated data, recent investigations have shown that several
important natural phenomena, such as infectious diseases,
ecological systems, protein folding, society and the internet, are
characterized by scale-free and/or small-world behavior as far as
their connectivity is concerned [2]. In particular, studies
modeling linguistics in terms of complex networks have indicated
that several aspects of human language, such as word proximity and
synonyms, are at least partially characterized by power-law
behavior. As language corresponds to one of the most essential
manifestations of the human brain, such findings can be taken as
an indication that this complex structure, or at least the
portions of it more closely related to language and associations,
may also be intrinsically organized according to power laws and
scale-free behavior. As the conscious and predominantly sequential
flow of ideas in humans, James's fringe of consciousness, is
closely related to the externalization of ideas through language,
it is also possible that the scale-free properties found in
linguistic structures are an intrinsic property of consciousness.

The current work aims at investigating such possibilities through
a psychophysical experiment in which a human subject is asked to
associate words from the English language. By understanding the
presented words and associations as graph nodes and edges,
respectively, it is possible to perform a quantitative analysis of
the digraph connections by considering statistics (average and
standard deviation) of network measurements such as the node
degree, the average length, and the clustering coefficient.

The current article starts by describing the experimental approach
and proceeds by analysing and discussing the data obtained.



II. THE PSYCHOPHYSICAL EXPERIMENT 

Over the last decades, psychophysics has established itself as an
important area in psychology and neuroscience, providing
invaluable means for quantifying perception. Provided the
experiments are carefully devised and conducted, objective and
relatively precise information can be obtained about the dynamics
of perception. As in physics, the experiments have to be planned
and performed while most factors likely to influence the
investigated phenomena are kept constant. The popularization of
personal computers has motivated the use of such machines for
automating psychophysical experiments, which enhances
repeatability and standardization.

In this work, a program was developed in SCILAB with the specific
purpose of investigating associations between words from the
English language. Starting with the word 'sun', the subject is
prompted to associate a subsequent word. No specific instructions
are given regarding the type of association, except that special
characters, plurals and verb conjugations are to be completely
avoided. There is no time limit for providing the new word, and
the experiment can be split into several sessions, with all the
obtained data stored in files. An illustration of the first steps
of the experiment is provided below, where the words supplied by
the subject appear to the right of each arrow:

sun → desert
desert → pyramid
sun → gold
pyramid → triangle
pyramid → desert
triangle → square



Observe that the only predefined word is the one presented first,
all the others being subsequently defined by the subject. The
presented and suggested words are henceforth referred to as
presented word and input word, respectively. After each new word
is input, its presence in the current list of words is verified,
and the word is added to the list if it is not yet there. Each
word is treated as a graph node, and each pair of words is
understood as a graph edge (presented word, input word). The
resulting directed graph (i.e. a digraph), with the frequency of
each association treated as the weight of the respective edge,
provides an interesting formal representation of the word
associations. The whole sequence of presented and input words is
recorded for further analysis. In order to guarantee that the
words are presented in a uniform fashion, in the sense that each
word is presented about as many times as the others, a density
function p(w) describing the number of times each word w has been
presented is maintained at all times. The presented words are
drawn from the complemented density function, i.e.
max{p(w)} - p(w), so that the less frequently presented words are
more likely to be chosen, leading to a levelling effect. The
experiment terminates after a pre-defined number of words has been
presented, and the most recently input words, which have
consequently been presented only a few times, are excluded from
the data and the respective network.
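The presentation scheme described above can be sketched in a few lines of Python (the original program was written in SCILAB and is not reproduced here). The function names, the dictionary-based storage and the uniform fallback used when all words have been presented equally often are assumptions made for illustration only; they are not taken from the original implementation.

```python
import random


def complemented_weights(presented_counts):
    """Return the weights max{p(w)} - p(w) for every known word.

    Assumption: when all counts are equal the complemented weights
    are all zero, so a uniform choice is used instead.
    """
    top = max(presented_counts.values())
    weights = {w: top - c for w, c in presented_counts.items()}
    if sum(weights.values()) == 0:
        return {w: 1 for w in presented_counts}
    return weights


def choose_presented_word(presented_counts, rng):
    """Draw the next presented word, favouring rarely presented ones."""
    weights = complemented_weights(presented_counts)
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=1)[0]


def record_association(edges, presented_counts, presented, answer):
    """Update the weighted digraph and the presentation counts."""
    key = (presented, answer)
    edges[key] = edges.get(key, 0) + 1          # edge weight = frequency
    presented_counts[presented] = presented_counts.get(presented, 0) + 1
    presented_counts.setdefault(answer, 0)      # new words start at zero


if __name__ == "__main__":
    rng = random.Random(0)
    edges, counts = {}, {"sun": 0}
    record_association(edges, counts, "sun", "desert")
    print(choose_presented_word(counts, rng))
```

In this sketch a word that has already been presented as often as the most presented word receives zero weight, which is one way of reading the levelling effect mentioned in the text.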



III. RESULTS 

The above experiment was performed with a single subject over a
whole week, totaling 305 different words, from which 250 words
were chosen (the remainder, more recently input words, were
discarded for the sake of enhanced uniformity). A total of 1930
associations were recorded. The types of the input words are given
in Table I, and Table II shows the frequency of types of
associations. Figure 1 presents the population of each input word,
and Figure 2 depicts the occurrence of new words along the
presentation stages identified by i. Figure 3 gives the histogram
of repeated associations. The average and standard deviation of
the node degree k, clustering coefficient C and average length ℓ
are presented in Table III. Figure 4 shows the histogram of equal
words separated by specific lags along the presentation sequence.
For instance, the ordinate value at lag 100 indicates the number
of pairs of equal words that are 100 positions apart along the
sequence. For the sake of enhanced uniformity, the sequence is
considered up to its total length minus the maximum lag value. The
loglog curves of the cumulative output and input node degree
(recall that we are dealing with a digraph) are presented in
Figures 5 and 6, respectively.

TABLE I: Total of words by category.

TABLE II: Number of associations by category.
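As an illustration of how measurements of the kind reported in Table III could be obtained from a list of recorded associations, the following Python sketch computes output and input node degrees, a clustering coefficient and shortest path lengths. The text does not state which digraph variants of these measurements were adopted, so the choices made here (unweighted degrees, clustering on the undirected projection, path lengths averaged over reachable ordered pairs) are assumptions, as are all function names.

```python
from collections import deque
from statistics import mean, pstdev


def adjacency(edges):
    """Map every node to the set of its successors."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
        out.setdefault(v, set())
    return out


def out_in_degrees(out):
    """Return (output degree, input degree) dictionaries."""
    out_deg = {u: len(vs) for u, vs in out.items()}
    in_deg = {u: 0 for u in out}
    for vs in out.values():
        for v in vs:
            in_deg[v] += 1
    return out_deg, in_deg


def clustering(out):
    """Clustering coefficient on the undirected projection (assumption)."""
    und = {u: set() for u in out}
    for u, vs in out.items():
        for v in vs:
            if u != v:
                und[u].add(v)
                und[v].add(u)
    coeffs = {}
    for u, nbrs in und.items():
        k = len(nbrs)
        if k < 2:
            coeffs[u] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in und[a])
        coeffs[u] = 2.0 * links / (k * (k - 1))
    return coeffs


def shortest_path_lengths(out):
    """Directed shortest path lengths over all reachable ordered pairs."""
    lengths = []
    for source in out:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(d for node, d in dist.items() if node != source)
    return lengths


def summarize(values):
    """Average and (population) standard deviation of a measurement."""
    values = list(values)
    return mean(values), pstdev(values)


if __name__ == "__main__":
    toy = [("sun", "desert"), ("desert", "pyramid"), ("sun", "gold"),
           ("pyramid", "triangle"), ("pyramid", "desert"),
           ("triangle", "square")]
    out = adjacency(toy)
    out_deg, in_deg = out_in_degrees(out)
    print(summarize(out_deg.values()))
    print(summarize(clustering(out).values()))
    print(summarize(shortest_path_lengths(out)))
```

The toy edge list is the one given in the illustration of Section II; it is far too small to reproduce the values of Table III.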



IV. DISCUSSION 

The several interesting trends and phenomena identi- 
fied by analysis of the experimental data are character- 
ized and briefly discussed in the following: 

A. Attractor formation: As shown in Figure 2, the number of new
words input by the subject tended to diminish, reaching a
near-equilibrium state where very few new words are likely to be
added. This suggests an attraction basin defined by the initial
word.

B. Word density asymmetry: As illustrated in Figure 1,



FIG. 1: The histogram of input words (panel title: 'Input word density'; axis label: population).
FIG. 2: The occurrence of new words along the presentation 
sequence, indexed by i. 




FIG. 3: Histogram of weights, i.e. the number of times specific
associated pairs of words were produced during the experiment
(axis label: weight, with ticks from 1 to 13).



  k    7.72 ± 10.93
  C    0.075 ± 0.17
  ℓ    3.32 ± 0.95

TABLE III: The node degree k, clustering coefficient C and
average length ℓ (average ± standard deviation) for the network
obtained in the psychophysical experiment.
FIG. 4: The population of equal input words separated from one
another by specific lag values.



FIG. 5: Loglog representation of the cumulative output node
degree (panel title: 'Presented word dilog function').



FIG. 6: Loglog representation of the cumulative input node
degree (panel title: 'Input word dilog function'). An
approximately straight region is observed along the left-hand
side of the graph.



the subject tended to enter some words more often than others. In
particular, there was a generalized preference for adjectives such
as good and long, among others. This is hardly surprising, as
adjectives are more immediately applicable to several words.

C. Edges asymmetry: As clearly seen from Table II, not every
association is reciprocal, i.e. the existence of an edge (i, j)
does not necessarily imply the presence of (j, i). Examples of
such asymmetric cases obtained in the considered experiment
include (sky, blue) and (blue, sky). To some extent, such
asymmetries are observed in cases involving a more common word
followed by a less common one, such as a general adjective and a
specific noun. Another characteristic that has been verified from
the experiment is the tendency of the associations to correspond
to synonyms and antonyms, especially regarding pairs of
adjectives.
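One possible way of quantifying the edge asymmetry discussed in item C is to measure the fraction of edges (i, j) whose reverse (j, i) is also present. This reciprocity index is not reported in the present work; the sketch below, with hypothetical function names, is only meant to make the notion of a non-reciprocal association concrete.

```python
def reciprocity(edges):
    """Fraction of distinct edges (i, j), i != j, whose reverse also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def one_way_edges(edges):
    """Sorted list of edges (i, j) for which (j, i) was never produced."""
    edge_set = set(edges)
    return sorted((u, v) for u, v in edge_set if (v, u) not in edge_set)


if __name__ == "__main__":
    toy = [("desert", "pyramid"), ("pyramid", "desert"), ("sun", "gold")]
    print(reciprocity(toy))
    print(one_way_edges(toy))
```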

D. Wide variation of node degree: The high standard deviation
obtained for the node degree indicates that the number of
associations induced by each presented word varies considerably.
It is possible that more common words, which usually appear
connected to several other words, such as adjectives, tend to
favor a higher number of associations.

E. Associations asymmetry: One of the clearest results deriving
from the reported experiment was the fact that some associations
tended to be much more stable than others, in the sense that they
were more systematically repeated and yielded a smaller number of
variations. Examples of such cases include (bread, butter) and
(pecker, wood). This property seems to be connected to the wide
dispersion of the node degree, in the sense that association pairs
involving at least one word characterized by higher node degree
tended to favor a higher number of different associations.

F. Context biasing: As is clear from Figure 4, the choice of a
word by the subject tended to be influenced by those more recently
input. The memory effect seems to disappear for lags higher than
250 presentations.
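The lag histogram underlying the context biasing analysis can be sketched as follows, assuming that the recorded input words are available as a Python list. Following the description given in Section III, only positions up to the total length minus the maximum lag are considered, so that every lag is evaluated over the same number of positions. The function name is hypothetical.

```python
def lag_histogram(sequence, max_lag):
    """Count, for each lag, the pairs of equal words that far apart."""
    usable = len(sequence) - max_lag  # same number of positions per lag
    counts = {}
    for lag in range(1, max_lag + 1):
        counts[lag] = sum(
            1 for i in range(usable) if sequence[i] == sequence[i + lag]
        )
    return counts


if __name__ == "__main__":
    words = ["a", "b", "a", "b", "a", "c"]
    print(lag_histogram(words, 2))
```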

G. Small-world features: The relatively low average length shown
in Table III suggests that the obtained association network
follows the small-world paradigm. This is an immediate consequence
of the fact that the experiment inherently targets word
associations.
H. Power-law features: As could be expected, the output and input
degree distributions shown in Figures 5 and 6 turned out markedly
different, with the latter being more compatible with power-law
scaling, especially at the initial portion of the curve.
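A cumulative degree distribution such as those plotted in Figures 5 and 6, together with a rough estimate of its slope in loglog coordinates, could be obtained along the lines of the sketch below. The least-squares fit over all points is an assumption made for simplicity; it is not the procedure used in this work, and a straight-line fit in loglog space is only a crude indication of power-law scaling.

```python
import math


def cumulative_distribution(degrees):
    """Return pairs (k, fraction of nodes with degree >= k) for k > 0."""
    n = len(degrees)
    values = sorted(set(d for d in degrees if d > 0))
    return [(k, sum(1 for d in degrees if d >= k) / n) for k in values]


def loglog_slope(points):
    """Least-squares slope of log10(P) against log10(k)."""
    xs = [math.log10(k) for k, _ in points]
    ys = [math.log10(p) for _, p in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    points = cumulative_distribution([1, 1, 2, 4])
    print(points)
    print(loglog_slope(points))
```

Restricting the fit to the initial portion of the curve, where the straight region is reported, would require selecting a subset of the points before calling the hypothetical loglog_slope function.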

As the limited power-law trend is possibly the most complex and
interesting of the identified features, it is further discussed in
the following, including a possible explanation. Although
difficult to define, the conscious portion of thinking is a
predominantly sequential process. While solving a problem, or just
relaxing, the flow of ideas and concepts takes place as a sequence
of ideas associated in some way which is highly dependent on the
context defined by the more recent thoughts.
To a large extent, the successive ideas are characterized by some
strong or weak association. For instance, after thinking about the
sky, the next ideas are likely to be blue, air, sun or clouds.
Therefore, at least part of the flow of thoughts can be thought of
in terms of a Markovian system. At the same time, memories are
often related to associations [6]. From the computational point of
view, it is possible to enhance the storage potential by
organizing the stored concepts in a hierarchical fashion, so that
the description of new concepts at lower hierarchical levels can
include only the features not covered by the upper levels. For
instance, the description of a cat can be derived from that of
mammals, including only those characteristics that are intrinsic
to cats (see Figure 7). This concept of inheritance leads
naturally to associations between concepts and ideas, even between
two non-adjacent hierarchical levels, a phenomenon that can be
related to Hofstadter's 'strange loops'. Though additional
features are certainly incorporated into the brain dynamics, such
hierarchical and associative schemes lead to the interesting
situation where several concepts end up associated, even if
indirectly, to those in the upper hierarchies. Consequently, the
concepts tend to become more and more associated as one moves from
the lower to the upper hierarchical levels, possibly leading to a
rich-get-richer scheme, and hence to scale-free organization.

Given that the weighted adjacency matrix of the obtained graph,
once each row is normalized to unit sum, can be immediately
understood as the transition matrix of a Markovian system, it is
possible to use Monte Carlo simulation in order to produce
sequences of associated words, such as that illustrated below. As
the context is limited to one association level, such sequences
are characterized by frequent repetitions of words.

horse, brown, bear, brown, sugar, sweet, 
good, earth, land, good, well, good, time, 
out, sun, hot, water, cold, water, cold, 
wool, sheep, four, clock, six, tea, leaf, 
thin, sheet, wide, field 
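A Monte Carlo simulation of this kind can be sketched as a random walk over the weighted digraph, where the next word is drawn with probability proportional to the weight of the corresponding edge. The data layout (a dictionary mapping word pairs to frequencies), the function names and the decision to stop at words without recorded successors are assumptions of this sketch rather than details of the original simulation.

```python
import random


def transition_table(edge_weights):
    """Group the edge weights by presented word."""
    table = {}
    for (src, dst), weight in edge_weights.items():
        table.setdefault(src, {})[dst] = weight
    return table


def simulate(edge_weights, start, steps, rng):
    """Random walk with first-order (Markovian) context only."""
    table = transition_table(edge_weights)
    word, path = start, [start]
    for _ in range(steps):
        successors = table.get(word)
        if not successors:      # no recorded association: stop early
            break
        targets = list(successors)
        weights = [successors[t] for t in targets]
        # rng.choices normalizes the weights, which plays the role of
        # turning a row of the adjacency matrix into probabilities
        word = rng.choices(targets, weights=weights, k=1)[0]
        path.append(word)
    return path


if __name__ == "__main__":
    toy = {("sun", "hot"): 3, ("hot", "water"): 1,
           ("water", "cold"): 2, ("cold", "water"): 1}
    print(", ".join(simulate(toy, "sun", 4, random.Random(0))))
```

Because each step depends only on the current word, pairs such as water and cold can alternate indefinitely, which illustrates the repetitions mentioned above.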



V. CONCLUDING REMARKS 

The present work has illustrated how complex network 
and graph theory concepts and formalisms can be applied 
FIG. 7: Hierarchical organization of words and formation of 
strange loops imply that words in higher hierarchical levels 
acquire a higher number of associations. 



to characterize human cognitive activities, namely the association
of words. While previous related works such as Motter et al.
investigated word associations through the use of static
databases, the current approach considered psychophysical
experiments. One main difference implied by such an approach is
that the importance of associations can be inferred from the
respective frequencies. In addition, a random element is implied
by the fact that the subject is likely to vary the chosen
associations while affected by the context established by the
presentation sequence.

Although limited to a single subject, the obtained experimental
results led to a series of interesting findings, including the
identification of attraction basins, context biasing, association
asymmetries, small-world features, and near power-law scaling of
the node degree. A putative model possibly underlying the latter
phenomenon, involving the appearance of strange loops in the
hierarchical categorization of words, has also been proposed.
While extensive additional investigations are required in order to
confirm such preliminary results, it is felt that the identified
phenomena are likely to provide a reasonably formal scaffolding
for further investigating and understanding word associations by
humans and even more sophisticated brain dynamics.

The reported developments open several possibilities. First, it is
important to note that the specific measurements extracted from
digraphs obtained from different subjects could possibly be
correlated with individual features or even used for diagnosis. At
the same time, it is likely that the obtained graphs will present
a core shared by several subjects, corresponding to the more
established and invariant collective concepts, while the graph
difference residuals could provide interesting information about
intrinsic individual features and preferences. Another interesting
task would be to extend the reported approach in order to
investigate associations in visual language, for instance by using
eye-tracking systems. Several possibilities for further
investigation can be defined by considering modified versions of
the adopted psychophysical experiment. For instance, it would be
interesting to study situations where the subject is allowed to
enter a continuous flow of associated words, without any
interference from the computer, except the presentation of the
first word. Although more complex, given the additional degrees of
freedom, such investigations could provide additional insights
about long-term memory effects, which are expected to reduce the
number of word repetitions in the respective Monte Carlo
simulations. It would be interesting to compare how such extended
context modifies the properties of the respectively obtained
networks.